Abstract: This article is intended to assist teachers and researchers in designing 
studies that examine the efficacy of a particular intervention or strategy with 
students with sensory disabilities. Ten research designs that can establish causal 
inference (the ability to attribute any effects to the intervention) with and without 
randomization are discussed. 


Since the passage of the No Child Left Behind Act of 2001 (2002) and the subsequent reauthorization of the Individuals with Disabilities Education Act (2004), the field of special education has placed increasing emphasis on scientifically based research and evidence-based practice. The leading journal in special education, Exceptional Children, devoted an entire issue to specifying quality indicators for various research designs (Brantlinger, 2005; Gersten, 2005; Horner, 2005; Odom, 2005; Thompson, 2005), as a guide for evaluating research studies and improving the evidence base in special education.

In sensory disabilities, these quality indicators are often difficult to achieve. Sensory disabilities are low-prevalence disabilities: in 2012 to 2013, students with visual impairments comprised 0.1% of the school-age population (ages 3 to 21), while children who were deaf and hard of hearing comprised 0.2% (Snyder & Dillow, 2015). Conducting research is therefore made more difficult not only by the small numbers, but also by the fact that those numbers are dispersed over a huge geographic area. Many school districts in rural areas enroll no students with sensory disabilities; others may enroll only one student who has only one form of sensory disability. Accessing these students for the purpose of research incurs considerable expense. In addition, children with sensory disabilities are considered a heterogeneous population, meaning that two children of the same age, diagnosis, and grade level may have differing degrees of visual impairment or hearing loss that affect their ability to respond to any particular intervention. The low-prevalence nature of sensory disabilities means that replication studies are few and far between, since the same children cannot be used for replication, even with different investigators, and because the passage of time leads to confounding variables of prior exposure and maturation.

A number of articles have examined the level of evidence for various research studies in sensory disabilities in order to identify the much-desired evidence-based practices. These include Abou-Gareeb, Lewallen, Bassett, and Courtright (2001); Botsford (2013); Cawthon and Leppo (2013); Ferrell, Buettel, Sebald, and Pearson (2006); Ferrell, Dozier, and Monson (2011); Ferrell, Mason, Young, and Cooney (2006); Graeme et al. (2011); Kelly and Smith (2011); Luckner (2006); Luckner and Cooke (2010); Luckner and Handley (2008); Luckner, Sebald, Cooney, Young, and Muir (2005, 2006); Luckner and Urbach (2012); Parker, Davidson, and Banda (2007); Parker, Grimmett, and Summers (2008); Parker and Ivy (2014); Parker and Pogrund (2009); Wang and Williams (2014); and Wright, Harris, and Sticken (2010). All of these studies have concluded that pedagogy in sensory disabilities is characterized by a dearth of scientific evidence. A recent analysis conducted for the Collaboration for Effective Educator Development, Accountability, and Reform (CEEDAR) project at the University of Florida, which assists states and institutes of higher education to reform their teacher preparation programs, found that most of the research in visual impairment, deaf education, and deafblindness demonstrated low levels of evidence (Ferrell, Bruce, & Luckner, 2014). There were some areas where evidence could be considered emerging, such as the body of literature that questions a school district’s practice of employing paraeducators (often referred to as paraprofessionals) to serve students with visual impairments (Conroy, 2007; Forster & Holbrook, 2005; Griffin-Shirley & Matlock, 2004; Harris, 2011; Koenig & Holbrook, 2000; Lewis & McKenzie, 2010; Marks, Schrader, & Levine, 1999; McKenzie & Lewis, 2008; Russotti & Shaw, 2001), because it was composed of limited intervention studies and several opinion pieces. There were also areas that were considered strong, such as early identification and intervention services prior to six months of age for infants who are deaf or hard of hearing (Joint Committee on Infant Hearing, 2007; Meinzen-Derr, Wiley, & Choo, 2011; Moeller, 2000; Vlastarakos, Proikas, Papacharalampous, Exadaktylou, Mochloulis, & Nikolopoulos, 2010; Vohr et al., 2008; Yoshinaga-Itano, 2003a, 2003b; Yoshinaga-Itano & Gravel, 2001; Yoshinaga-Itano, Sedey, Coulter, & Mehl, 1998), which were supported by a number of intervention studies that were replicated by other researchers. Although the research base appears to be growing, Ferrell et al. (2014) concluded that research in sensory disabilities was generally characterized by studies that lacked causal inference and provided little evidence for the strategies that constitute practice.

In addition to the dearth of evidence-based claims about the effectiveness of interventions with persons with sensory disabilities, research on interventions derived from theoretical models is also noticeably absent. Moreover, there were few, if any, replications or systematic investigations of any particular class of interventions designed to enhance the education of students with sensory disabilities. The lack of such replications or investigations does not mean there is not a rich body of knowledge about these persons; however, there is relatively little scientific evidence concerning the efficacy of interventions that promote learning and development in children with sensory disabilities.

This article describes strategies for designing investigations that strengthen causal claims about interventions intended for individuals with sensory disabilities. First, we describe tactics for implementing what is considered by many to be the strongest design for drawing causal inferences about interventions: a design that uses a pretest, posttest, and random assignment of study participants to experimental and control groups. Because this particular design, although considered the “gold standard” in educational research (Sullivan, 2011), is not always possible with sensory disability populations, and it is not even particularly relevant (Cronbach, 1982), we next describe some enhancements to quasi-experimental designs where the participants are not randomly assigned to experimental and control groups, but which strengthen causal inferences. The regression discontinuity design, for example, has much to offer researchers who work with individuals with sensory disabilities; however, the Ferrell (2006) and Luckner (2006) analyses did not identify a single study using this design. Hence, a third objective is to introduce the regression discontinuity design as a useful technique for constructing causal claims about intervention strategies. Fourth, we explore the use of two designs in which participants serve as their own control group: the single-factor within-subject design and the time-series design. Although single-case designs were not among the designs identified by Valentine and Cooper (2004) as meeting evidence standards for making causal claims at the time of the Ferrell and Luckner reviews, current evidence standards (Valentine & Cooper, 2008; What Works Clearinghouse, 2014) offer guidelines for constructing causal claims from single-case designs. Nevertheless, there are other research designs that can be used to demonstrate the efficacy of an intervention.

Questions in research methodology

It has become all too fashionable to frame debates in education around research methodology. Instead, we believe, as do Bartlett et al. (2006), that it is more productive to think about the selection of a research methodology in relation to research questions and the interrelationships among the questions. Shavelson, Towne, and the Committee on Scientific Principles for Education Research (2002), for example, suggested that research in education, social sciences, and natural sciences takes one of three forms: (1) understanding what is happening; (2) detecting a causal relationship; and (3) explaining a causal relationship. Viewed this way, it makes little sense to quibble about whether one should use qualitative or quantitative methodology until one has given careful thought to the questions one seeks to answer.

Understanding what is happening 
To understand what is happening, researchers need to use the tools that produce rich descriptions of persons, their environments, and their interactions, as well as those that provide accurate quantitative estimates of important characteristics. On the one hand, researchers might seek to understand the experiences of persons with sensory disabilities in segregated educational settings and how they are different from those in inclusive settings. Descriptions emerging from interviews and ethnographic studies would be essential in accumulating information related to answering this question. On the other hand, investigators might seek to describe the characteristics of the population being served in the different settings based on gender, ethnicity, age, severity of disability, literacy level, and so on. Accurate descriptions of this nature require appropriate sampling methods, as well as descriptive and inferential statistical methods. Methods that estimate the magnitude of relationships among measured or inferred characteristics may also provide information that is useful for understanding what is happening.

Detecting a causal relationship 
When the focus of research questions turns to ascertaining causal effects, experiments that use random assignment to form treatment and control groups provide the most logically defensible no-cause baseline against which to estimate the effects of an intervention (Cook, 2002). Although experimental designs that use random assignment provide this baseline, it is important to acknowledge there are instances in which randomized experiments are simply not possible or may be premature. It may be important to know, for example, whether the magnitude of treatment effects varies with the severity of the sensory impairment. Sensory impairment, however, is a preexisting condition that is not amenable to random assignment. In such cases, one must rely on analytical (for instance, structural equation modeling) and logical methods for making causal claims (see Thompson, Diamond, McWilliam, Snyder, & Snyder, 2005, for further discussion).

Finally, it should be noted that descriptive studies that address the question of what is happening also play a key role in establishing the internal validity of experimental research. Descriptive research addresses the issues of what was actually happening in the implementation of the intervention. Questions related to the fidelity of the implementation of an intervention in experimental research play a key role in answering questions that seek to explain causal relationships.

Explaining a causal relationship 
When consistent evidence of a causal relationship is found in randomized experiments, the focus shifts to questions that seek to explain the causal process. If there are differences in literacy development in students educated in inclusive settings relative to self-contained settings, researchers want to know why that is so. In this instance, descriptive research plays a critical role in generating potential explanations for hypothesized causal relationships. Descriptions of the interactions among people in self-contained and inclusive settings, or studies of the way in which cognitive processes develop in the two settings through studies that enable learners to externalize their cognitive processes as they attempt to read and write, are likely to provide insights into strategies that are key processes leading to the development of literacy. Once the cognitive processes are clearly defined through descriptive studies, interventions may be designed around them and their effectiveness evaluated in controlled experiments.

Overcoming barriers to causal inference
Designing and implementing the simplest of research designs that support causal inference is a daunting task. Thus, the purpose of an investigator is not to find fault with the published research, but to encourage continuing refinements in the research that is conducted to find ways that provide incontrovertible evidence of efficacy. There are two basic research designs that meet evidence standards for causal validity: randomized experiments and regression discontinuity designs. Quasi-experiments with equivalent groups meet evidence standards with reservations. [The design standards for both regression discontinuity designs and single-case designs are still under review and are considered pilots (What Works Clearinghouse, 2014).] Quasi-experimental designs that can establish causal validity include nonequivalent groups with multiple pretests, single-factor within-subject designs, and single-case designs, all described below.

Figure 1. Randomized experiment with pretest and posttest.

Randomized experiments 
Components of the randomized experiments that support claims about causal validity are shown in Figure 1. The most crucial component of this research design is the random assignment of study participants to the treatment and no-treatment control conditions. Random assignment of participants to treatment and control groups is the best method for equalizing any differences between two groups of participants at the outset of the study. Thus, the group that does not receive the intervention serves as the most logically defensible no-cause baseline for evaluating the effects of the intervention. Although it is not necessary to include a pretest, including one can increase the power of the statistical test to detect treatment effects, because treating the pretest as a covariate reduces error variance.

The What Works Clearinghouse (2014) design standards require a minimum of 350 individuals for a randomized controlled trial, which is largely unattainable in sensory disability fields. However, an obvious objection to randomized experiments is that a potentially beneficial intervention needs to be withheld from some participants. Even though the efficacy of the intervention is not known and in fact could be detrimental, and even though a lottery is the fairest way of making the assignment, the use of such designs is questioned on ethical grounds (Cook & Campbell, 1979). Indeed, there are many other philosophical and practical objections to the use of randomized experiments (Cook, 2002). Because quasi-experiments involve withholding an intervention, and because they do not provide strong evidence for causal validity, it is difficult to justify choosing quasi-experiments over randomized experiments.

One strategy for overcoming objections to withholding an intervention that is perceived as valuable is to use a wait-list design that simply delays the introduction of the intervention for one group of participants. An example of this design is shown in Figure 2. The design uses random assignment to place study participants into either the immediate (Phase I) or delayed (Phase II) intervention conditions and uses multiple posttests.

Figure 2. Randomized wait-list control group experiment with pretest and multiple posttests.
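
The random assignment step of a wait-list design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `assign_wait_list`, the participant labels, and the fixed seed are hypothetical and do not come from any study cited here.

```python
import random


def assign_wait_list(participant_ids, seed=None):
    """Randomly split participants into immediate and delayed intervention groups.

    Hypothetical helper for illustration; a seed is accepted only so that
    the lottery can be reproduced and audited.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = (len(shuffled) + 1) // 2  # immediate group gets the extra person if n is odd
    return {
        "immediate": sorted(shuffled[:half]),  # Phase I intervention
        "delayed": sorted(shuffled[half:]),    # Phase II (wait-list) intervention
    }


# Hypothetical participant labels
participants = [f"S{i:02d}" for i in range(1, 9)]
groups = assign_wait_list(participants, seed=42)
print(groups)
```

Both groups would complete the pretest and every posttest; only the timing of the intervention differs between them.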

Another strategy for overcoming objections to withholding treatment is to actually conduct two studies within a study. Atkinson (1968) randomly assigned students to receive computer-assisted instruction either in reading or in mathematics. Both groups completed pretests and posttests in both subjects and, later, the interventions were switched. The group that was receiving computer-assisted instruction in reading was switched to receive computer-assisted instruction in mathematics, whereas the students originally receiving computer-assisted instruction in mathematics were switched to receive computer-assisted instruction in reading. After receiving the second treatment in Phase II, both groups completed a second posttest in both subjects. Figure 3 depicts this multiple treatment design. In this design, each group serves as a control (the no-cause baseline) for the other group in order to estimate the impact of the interventions. The key to using this design is that the two interventions are independent in terms of their impact on performance in the domains being assessed. (Although students’ reading and mathematics achievement are likely to be correlated, it is not likely that instruction in mathematics will affect reading, or that reading instruction will impact mathematical reasoning.) Insofar as both interventions are perceived as valuable, the design overcomes objections to withholding treatment even when only the first phase is completed. Precision in the estimation of the magnitude of the effect of treatment can be increased in the analysis by using students’ pretest scores in reading and mathematics as covariates in an analysis of covariance model.

Randomized experiments with planned variation designs

A variant of the multiple treatment design is the planned variation design (Rivlin & Timpane, 1975), although it, too, is subject to the minimum of 350 participants set by design standards (What Works Clearinghouse, 2014). Continuing with the example above, an investigator may vary the level or amount of the intervention strategy. For example, the investigator may fix the amount of instructional time to 20 minutes on one, three, or five days a week (an approach reminiscent of the dose-response design often used in medical research). In the absence of a control group, however, this design only gives an estimate of the relative impact of an intervention. Adding a control group that receives a valued intervention that does not impact the outcome of interest provides the necessary no-cause baseline to evaluate the absolute effect (Cook & Campbell, 1979). The planned variation design is depicted in Figure 4. Notice that a control group receiving an intervention for mathematics is included. Insofar as the mathematics intervention is not expected to impact the reading outcome measure, it may be considered a no-cause baseline to judge the effects of varying levels of treatment.

Figure 3. Randomized experiment with multiple treatments, pretest, and multiple posttests.

Figure 4. Randomized experiment with planned variation of intervention.

Quasi-experiments 

Nonequivalent group designs. Research designs based on intact groups of participants to form experimental and control groups were by far the most frequently used designs for research on literacy among individuals with sensory disabilities. Figure 5 depicts the nonequivalent group design that is encountered most often. The design usually includes an untreated control group and pretest-posttest measures. Because the process of forming the groups is unknown and participants cannot be matched on every characteristic that may affect outcomes, there are many potential differences between the two groups that could appear as intervention effects even when investigators take extraordinary steps to match the two groups and randomly assign the groups to experimental and control conditions. Depending on the outcome of the experiment, the results may or may not be interpretable (Cook & Campbell, 1979; Shadish, Cook, & Campbell, 2002). The four major threats to causal inferences that one must eliminate in this design are selection-maturation, instrumentation, differential statistical regression, and local history.

Figure 5. Nonequivalent groups (quasi-experiments with pretests and posttests) that are most often encountered.


©2015 AFB, All Rights Reserved Journal of Visual Impairment & Blindness, November-December 2015 475 

Figure 6. Hypothetical results of an experiment with multiple pretests.

Nonequivalent groups with multiple pretests. The interpretation of nonequivalent designs with a control group can be enhanced considerably by adding one or more pretests to the design, as shown in Figure 6. Including an additional pretest can be useful for eliminating threats to causal inferences based on the factors of selection and maturation and statistical regression. If the two groups exhibit similar levels of growth between the two pretests, this would eliminate the rival hypothesis of differential maturation between the two groups. Consider the hypothetical results in Figure 7 compared to the design in Figure 6. If the second pretest was the only pretest, one could argue that the experimental group was already at an advantage and that its superior performance on the posttest reflected the earlier advantage and had nothing to do with the intervention. The additional pretest, however, leads to a different interpretation. Although the experimental group exhibits a slight advantage in reading achievement on the first pretest, both the experimental and control groups are developing at approximately the same rates between the two pretests. Thus, differential maturation can be ruled out as an explanation for the differences between the two groups on the posttest, although it is necessary to account for the differences between the two groups on the pretest measures as well as their correlation.
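
The logic of the double-pretest comparison can be illustrated with a small descriptive calculation. The sketch below is an assumption-laden illustration, not an analysis from any cited study: the function name and the group means are invented, and a real analysis would also model pretest differences and their correlation rather than compare raw means.

```python
def pretest_growth_check(experimental, control):
    """Compare group growth before and after the intervention is introduced.

    Each argument is a dict of hypothetical mean scores at 'pre1', 'pre2',
    and 'post'. Returns the between-group gap in growth for each interval.
    """
    exp_pre_growth = experimental["pre2"] - experimental["pre1"]
    ctl_pre_growth = control["pre2"] - control["pre1"]
    exp_post_gain = experimental["post"] - experimental["pre2"]
    ctl_post_gain = control["post"] - control["pre2"]
    return {
        # near zero suggests similar maturation rates before the intervention
        "pre_growth_gap": exp_pre_growth - ctl_pre_growth,
        # extra gain of the experimental group once the intervention begins
        "post_gain_gap": exp_post_gain - ctl_post_gain,
    }


# Invented group means: the experimental group starts slightly ahead,
# but both groups grow by the same amount between the two pretests.
experimental = {"pre1": 42.0, "pre2": 47.0, "post": 60.0}
control = {"pre1": 40.0, "pre2": 45.0, "post": 50.0}
result = pretest_growth_check(experimental, control)
print(result)
```

With these invented means the groups grow equally between the pretests, so the later divergence is harder to attribute to differential maturation.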

Figure 7. Nonequivalent groups, quasi-experiment, with multiple pretests and posttest.

Regression discontinuity designs. Given the simplicity of this design and its ability to control threats to internal validity, it is surprising that it is not used more often (the Ferrell, 2006, and Luckner, 2006, studies did not identify one study using a regression discontinuity design). Suppose that an individual has administered a reading assessment at the beginning and end of the school term. For a skill like reading, it would not be unreasonable to observe a moderate to high positive correlation between the two sets of scores like that exhibited in Figure 8. Further, suppose that a researcher decides to assign all students with reading scores below a certain cutoff (the mean, for example) to receive a literacy intervention. If the intervention were effective, how would it affect the expected relationship between pretest and posttest scores? Figure 9 shows what a simple main effect of intervention would look like. If the treatment were effective, there would be a discontinuity in the regression line describing the relationship between pretest and posttest scores. Lord and Novick (1968) show that the discontinuity in the regression line is an unbiased estimate of the intervention effect if the basic requirements of the design can be satisfied:

1. Assignment of participants to conditions is based on the cut-off score, and only on the cut-off score;

2. The cut-off score is the mean of the distribution; and

3. The functional form of the relationship between the assignment variable and the dependent variable is known.

Figure 8. Hypothetical effects of intervention on the relationship between students’ pretest and posttest reading assessment scores.

Figure 9. Hypothetical effects of intervention on the relationship between students’ pretest and posttest reading assessment scores.

More information on regression discontinuity designs can be obtained from the Institute of Education Sciences’ online Technical Methods Report (2008).
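
To make the idea of a discontinuity in the regression line concrete, the following sketch fits a single regression of posttest on pretest with an added treatment indicator. It assumes a linear functional form and assignment strictly by the cut-off; the function names and the noise-free data are hypothetical, chosen so that the built-in effect is recovered exactly. It is an illustration only, not the procedure of any cited source.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rd_estimate(pretest, posttest, cutoff):
    """Least-squares fit of posttest = b0 + b1 * (pretest - cutoff) + b2 * treated.

    Students scoring below the cutoff are treated; b2 is the estimated
    jump (discontinuity) in the regression line at the cutoff.
    """
    rows = [[1.0, p - cutoff, 1.0 if p < cutoff else 0.0] for p in pretest]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, posttest)) for i in range(k)]
    intercept, slope, effect = solve_linear_system(xtx, xty)
    return {"intercept": intercept, "slope": slope, "effect": effect}


# Hypothetical noise-free data: posttest = 10 + 0.8 * pretest,
# plus a 5-point benefit for students below the cutoff (the pretest mean).
pretest = [30, 35, 40, 45, 50, 55, 60, 65, 70]
cutoff = 50
posttest = [10 + 0.8 * p + (5 if p < cutoff else 0) for p in pretest]
fit = rd_estimate(pretest, posttest, cutoff)
print(fit)
```

With real data the scores would be noisy, and the assumed functional form would need to be checked before interpreting the estimated jump.
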

Single-factor within-subject designs. In some cases, evidence for causal validity can be demonstrated using a design where participants serve as their own control group by participating in all conditions of the experiment (Keppel & Zedeck, 1989). In Figure 10, participants are randomly assigned to receive both interventions, but in two different orders. Wauters (2001), for example, investigated the question of whether deaf students (mean hearing loss 104 dB) recognized written words learned through sign (Dutch sign language) and spoken language more effectively than when written words were learned solely through speech. Participants were trained in both conditions in different orders and with different word lists. Both the order of interventions and word lists were counterbalanced to control for practice effects and word list effects. In the absence of counterbalancing, any differences between the two learning conditions could be attributed to their order or to some unknown differences between the word lists. Wauters (2001) went one step further to include a third word list that was not part of training. Thus, the two learning conditions could be evaluated in absolute terms as well as in relative terms.

Figure 10. Randomized within-subjects experiment.
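
Counterbalancing of this kind can be sketched as follows. The example crosses every order of two conditions with every order of two word lists and spreads randomly shuffled participants evenly over the resulting sequences. The condition labels, list names, and function name are hypothetical stand-ins, not the materials or procedure of the cited study.

```python
import itertools
import random


def counterbalanced_schedule(participant_ids, conditions, word_lists, seed=None):
    """Assign each participant a sequence of (condition, word list) pairs.

    Every ordering of conditions is crossed with every ordering of word
    lists, and the sequences are used equally often when the number of
    participants is a multiple of the number of sequences.
    """
    sequences = [
        list(zip(condition_order, list_order))
        for condition_order in itertools.permutations(conditions)
        for list_order in itertools.permutations(word_lists)
    ]
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # random assignment of participants to sequences
    return {pid: sequences[i % len(sequences)] for i, pid in enumerate(ids)}


# Hypothetical labels for two training conditions and two word lists
schedule = counterbalanced_schedule(
    [f"P{i:02d}" for i in range(1, 9)],
    conditions=("sign and speech", "speech only"),
    word_lists=("List A", "List B"),
    seed=1,
)
for pid in sorted(schedule):
    print(pid, schedule[pid])
```

With eight participants and four sequences, each sequence is used twice, so neither order nor word list is confounded with the learning condition.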

There is clearly one condition, however, where the single-factor within-subject design would not be considered appropriate. When an intervention has a lingering or carryover effect, it is possible that it could interact with one of the other conditions to produce a differential carryover effect. In this case, counterbalancing does not control this threat to causal validity. For example, suppose a researcher is investigating the effects of a reading comprehension strategy (A) relative to the students’ usual reading strategies (B). It is unlikely that students who receive the reading comprehension strategy (A) first will revert completely to their usual reading habits (B), even when told to do so. Thus, the reading performance under the two conditions is not a fair comparison, because students who receive the AB order are likely to do better under B conditions than students who receive the BA order. That is, intervention A carries over to intervention B, but intervention B does not carry over to intervention A.

Single-case designs. Another category of research designs where participants serve as their own control includes single-case designs (Kazdin, 1982). Although there are many variations of single-case designs, they share the property of repeated measurements of the outcome variable across different conditions, such as before and after the introduction of an intervention or across alternating interventions (an intervention involving changing criteria for performance). Single-case designs offer considerable flexibility because the units of analysis may consist of an individual, several individuals, dyads, small groups, classrooms, or even institutions (Levin, O’Donnell, & Kratochwill, 2003). When implemented properly, single-case designs can provide strong evidence for causal validity (Kratochwill et al., 2010; Kratochwill & Levin, 2010), although generalizability to the larger population is still limited because the sample size is so small.

In this paper we focus on the multiple-baseline design. It is a flexible single-case design that some argue offers the strongest support for making causal inferences about the effectiveness of interventions (Kratochwill & Levin, 2010). Figure 11 is a schematic representation of one possible multiple-baseline design. There are several prominent features of the multiple-baseline design that are crucial to strengthening internal validity. The design is simple in that it involves only two phases per case, a baseline phase (A) and an intervention phase (B). A simple AB design with a single case would not control threats to internal validity or support generalization and does not meet evidence standards for making causal claims about an intervention. Notice, however, that the AB phases are replicated across three units or cases (for instance, students). Replication of an intervention effect across cases is a critical requirement for demonstrating experimental control, but it is not sufficient. If the introduction of the intervention occurs at the same time for each case, many threats to internal validity, such as concomitant events, are not controlled. Staggering the introduction of the intervention at different points in time for each individual, however, controls many threats to internal validity. If the outcome measure changes when, and only when, the intervention is introduced, the researcher has evidence of a functional or causal relationship between the intervention and the outcome measure.

3 AAAAAAAABBBBBBB
2 AAAAAAAAAABBBBB
Note: A = baseline phase; B = intervention phase.

Figure 11. Schematic representation of the multiple-baseline design.

Additional control over threats to internal validity can be introduced to the multiple-baseline design via two randomization schemes (Kratochwill & Levin, 2010). One way to introduce randomization into this design is to assign participants at random to the staggered intervention conditions. Additionally, the investigator could assign the timing of the intervention for each case at random. Random assignment, however, leaves open the possibility of having only a single baseline or post-intervention observation and identical starting points across cases. An alternative approach is the regulated randomization procedure (Koehler & Levin, 1998).

In regulated randomization, the re¬ 
searcher specifies the minimum number 
of observations required for the baseline 
and intervention phases of the study by 
considering the total number of obser¬ 
vations available. In Figure 11, a deci¬ 
sion was made to have a minimum of 
three baseline observations with a min¬ 
imum of five post-intervention observa¬ 
tions. Given the 15 available measure¬ 
ment intervals, the intervention could 
be introduced prior to the fourth obser¬ 
vation but no later than after the tenth 
observation. Accordingly, there are 
seven possible intervention points that 
could be selected at random. To main¬ 
tain clear temporal separation of the 
introduction of the intervention across 
cases, regulated randomization allows 
the researcher to specify a range of 
starting points for the intervention such 
that there is no overlap in starting points 
across cases (Koehler & Levin, 1998). 
With 15 observation periods available, 
the researcher could randomly choose 
to start the intervention prior to the 
fourth or fifth observation for one case. 
For the next case, the researcher would 
use random selection to start the inter¬ 
vention prior to the sixth or seventh 
observation, and so on for the remaining
two cases. The outcome of the random
assignment of starting points for the 
intervention and random assignment of 
students to starting points is depicted in 
Figure 11. If there are more than four 
cases, students are randomly assigned 
to one of the four starting points. 
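
The regulated randomization procedure described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' procedure: the function name `regulated_schedule` and the case labels are hypothetical, and the last two start windows (observations 8 to 9 and 10 to 11) are assumed to continue the pattern the text gives for the first two cases.

```python
import random

N_OBS = 15          # observation periods available (from the text)
MIN_BASELINE = 3    # minimum baseline observations (from the text)
MIN_POST = 5        # minimum post-intervention observations (from the text)

# Non-overlapping windows for the first observation under intervention.
# The first two windows come from the text; the last two are assumed.
WINDOWS = [(4, 5), (6, 7), (8, 9), (10, 11)]


def regulated_schedule(cases, rng):
    """Randomly pair cases with windows, then draw one start per window."""
    if len(cases) != len(WINDOWS):
        raise ValueError("this sketch expects one case per window")
    order = list(cases)
    rng.shuffle(order)                    # random assignment of cases to windows
    schedule = {}
    for case, (low, high) in zip(order, WINDOWS):
        start = rng.randint(low, high)    # random start inside the window
        baseline_n = start - 1
        post_n = N_OBS - start + 1
        if baseline_n < MIN_BASELINE or post_n < MIN_POST:
            raise ValueError("window violates the minimum phase lengths")
        schedule[case] = start
    return schedule


if __name__ == "__main__":
    plan = regulated_schedule(["A", "B", "C", "D"], random.Random(2024))
    for case, start in sorted(plan.items(), key=lambda item: item[1]):
        print(f"case {case}: baseline 1-{start - 1}, intervention {start}-{N_OBS}")
```

Because each case draws its start from its own window, no two cases can begin the intervention at the same observation, which preserves the temporal separation the design depends on.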

In sum, the replication of occasions to 
observe intervention effects, the temporal 
separation of starting points of the inter¬ 
vention across replications, the use of ran¬ 
dom assignment to structure starting 
points, and the random assignment of 
cases to starting points strengthen the in¬ 
ternal validity of the multiple-baseline de¬ 
sign and any resulting claims about a 
causal relationship between the interven¬ 
tion and the outcome measure of interest. 

To meet evidence standards, a multiple- 
baseline design (and other single-case de¬ 
signs) must meet four criteria (Kratochwill 
et al., 2010). First, the researcher must 
document control over how and when the 
intervention is implemented. Second, it is 
necessary to establish a high standard of 
interobserver agreement for the measure¬ 
ment of the outcome measure (greater 
than 80%). Interobserver agreement must 
be calculated for each case on 20% of the 
observations in each phase (baseline and 
intervention). Third, a multiple-baseline 
design must include at least three replica¬ 
tions of the intervention at three different 
points in time. Fourth, each phase must
include at least five data points. To meet
evidence standards with reservations, 
each phase must include at least three data 
points. The design in Figure 11 meets all 
of the criteria except for the last one, 
because one case has fewer than five ob¬ 
servations in one baseline phase. Accord¬ 
ingly, this multiple-baseline design meets 
evidence standards with reservations. Increasing
the number of observation periods
to 18 would enable the researcher to
meet the criterion of at least five obser¬ 
vations per phase. 
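
The four criteria can be expressed as a simple screening function. The sketch below is an illustration only: the name `design_rating`, its parameters, the label "does not meet standards," and the example start points are all assumptions introduced here, not part of the published standards.

```python
def design_rating(starts, n_obs, ioa_percent, ioa_share, controlled):
    """Rate a multiple-baseline design against the four criteria in the text.

    starts:      first intervention observation (1-based) for each case
    n_obs:       total number of observation periods
    ioa_percent: interobserver agreement, in percent
    ioa_share:   smallest proportion of observations in any phase that
                 were scored by a second observer
    controlled:  True if the researcher controlled how and when the
                 intervention was implemented
    """
    phase_lengths = []
    for start in starts:
        phase_lengths += [start - 1, n_obs - start + 1]  # baseline, intervention
    shortest = min(phase_lengths)

    basics_met = (
        controlled                     # criterion 1
        and ioa_percent > 80           # criterion 2: agreement level
        and ioa_share >= 0.20          # criterion 2: share of observations checked
        and len(set(starts)) >= 3      # criterion 3: three staggered replications
    )
    if not basics_met or shortest < 3:
        return "does not meet standards"   # label assumed for this sketch
    if shortest >= 5:                      # criterion 4
        return "meets standards"
    return "meets standards with reservations"


# Hypothetical start points for four cases.
print(design_rating([4, 6, 8, 10], n_obs=15, ioa_percent=90, ioa_share=0.20, controlled=True))
print(design_rating([6, 8, 10, 12], n_obs=18, ioa_percent=90, ioa_share=0.20, controlled=True))
```

With these hypothetical values, the 15-observation design is rated as meeting standards with reservations because one baseline has only three points, whereas the 18-observation design leaves at least five points in every phase.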

Once a design meets these criteria, the
study must provide at least three demonstrations
of experimental control over the outcome
variable (an effect) to claim “strong evidence”
of a causal relationship. If a study provides
three demonstrations of an effect and at least
one instance of no effect, then there is
“moderate evidence” for a causal relationship.
Otherwise, the study provides “no evidence” to
support the claim of a causal relationship
(Kratochwill et al., 2010). Additionally, interpretation of
the results of a multiple-baseline design 
must be considered in the context of 
whether the interventions are defined in 
terms of meaningful theoretical con¬ 
structs (for example, scaffolding, peer- 
tutoring, comprehension monitoring, etc.) 
and whether the intervention was imple¬ 
mented as intended. 
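
The decision rule for rating the evidence can be written compactly. The function below is a sketch of the rule as stated in this section; the name `evidence_rating` and its arguments are hypothetical.

```python
def evidence_rating(demonstrations, non_effects):
    """Classify evidence of a causal relationship in a single-case study.

    demonstrations: number of demonstrations of an effect
    non_effects:    number of instances in which no effect was observed
    """
    if demonstrations < 3:
        return "no evidence"
    if non_effects >= 1:
        return "moderate evidence"
    return "strong evidence"


# Four cases, with an effect demonstrated in three and absent in one.
print(evidence_rating(demonstrations=3, non_effects=1))
```

The rating says nothing about the theoretical meaning of the intervention or the fidelity of its implementation, which must be judged separately.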

While there is more widespread agree¬ 
ment around the procedures necessary to 
control threats to the internal validity of 
the multiple-baseline design, there is con¬ 
siderable debate about the methods for 
detecting the presence of intervention effects
in single-case designs. Visual analysis
(Hersen & Barlow, 1976; Kennedy,
2005; Kratochwill et al., 2010); percentage
of nonoverlapping data (Scruggs &
Mastropieri, 1998, 2013); regression
analysis, including hierarchical growth
models (Raudenbush & Bryk, 2002;
Singer & Willett, 2003); and randomization
tests (Edgington, 1975, 1992;
Koehler & Levin, 1998; Levin &
Wampold, 1999) are four of the primary
methods proposed for analyzing time-series
data arising from single-case
designs to detect intervention effects. Although
a detailed discussion of these topics
is beyond the scope of this article, it is
important to note that regardless of one’s 
analytical approach there are no clear 
guidelines about how to integrate findings 
from single-case designs with studies us¬ 
ing more traditional group comparison 
designs. 
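
To make two of these methods concrete, the sketch below computes the percentage of nonoverlapping data and a start-point randomization test for a single case. The data are invented for illustration, the function names are hypothetical, and the outcome is assumed to be one that should increase under the intervention. The randomization test is valid only when the actual start point was drawn at random from the admissible set.

```python
def pnd(baseline, intervention):
    """Percentage of intervention points above the highest baseline point."""
    ceiling = max(baseline)
    above = sum(1 for y in intervention if y > ceiling)
    return 100.0 * above / len(intervention)


def mean(values):
    return sum(values) / len(values)


def randomization_p(series, actual_start, possible_starts):
    """One-sided p-value for a randomly selected intervention start point.

    Starts are 1-based indexes of the first intervention observation.
    The statistic is the intervention-phase mean minus the baseline mean,
    recomputed for every start the design could have selected.
    """
    def shift(start):
        return mean(series[start - 1:]) - mean(series[:start - 1])

    observed = shift(actual_start)
    as_large = sum(1 for s in possible_starts if shift(s) >= observed)
    return as_large / len(possible_starts)


# Hypothetical case: 12 observations, intervention begins at observation 6,
# and starts 4 through 8 were admissible (3+ baseline, 5+ intervention points).
series = [2, 3, 2, 3, 2, 6, 7, 6, 8, 7, 7, 8]
start = 6
print(pnd(series[:start - 1], series[start - 1:]))
print(randomization_p(series, start, possible_starts=range(4, 9)))
```

With only five admissible start points, the smallest attainable p-value for one case is 1/5, which is one reason randomization tests for multiple-baseline designs combine information across cases.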

Despite the controversy over estimating
effect sizes from single-case
designs, the advantages of a well- 
controlled single-case design outweigh 
those of between-group designs, where 
threats to internal validity are not con¬ 
trolled. Investigators are encouraged to 
publish the raw data that will enable re¬ 
viewers to conduct reanalysis of data in 
the light of new developments. Addition¬ 
ally, effect size estimates from single¬ 
case designs can be considered separately 
from the effect sizes obtained in tradi¬ 
tional between-group designs to identify 
what works. Wendel, Cawthon, Ge, and 
Beretvas (2015) provide a useful example 
of a review of studies on individuals who 
are deaf or hard-of-hearing. As Kratochwill
et al. (2010) suggest, the rank orderings
of interventions from within-subjects
studies and from between-subjects
studies are likely to be similar despite the
present incomparability of their effect- 
size estimates. 

Conclusion 

In sum, the research designs we have fo¬ 
cused on in this article are by no means 
novel. However, each design offers useful 
strategies to support causal claims about 
the effectiveness of interventions with in¬ 
dividuals who have sensory disabilities. 
Undoubtedly, describing strategies for designing
research that supports causal
claims about the effectiveness of interventions
among individuals with sensory
disabilities is far easier than conducting 
the studies in the myriad settings in which 
researchers and teachers work to support 
them. Progress in the field, finding what 
works, depends on the ability of the mem¬ 
bers of the field to make logically defen¬ 
sible arguments for making causal claims 
about the effectiveness of interventions 
and the basis for estimating the magni¬ 
tude of their effects. 

We have written this article because we 
hope to inspire a new generation of re¬ 
searchers to think more creatively about 
the use of research designs that can 
strengthen evidence for drawing causal 
inferences about interventions that pro¬ 
mote learning and development among 
individuals with sensory disabilities. Our 
reviews have highlighted several difficul¬ 
ties in conducting research with low- 
prevalence populations, but hopefully 
they have also suggested methods that 
can strengthen research in the future. We 
offer these recommendations to new and 
experienced researchers, as well as to 
teachers conducting action research with 
their students: 

1. Research questions should influence 
the selection of methodology. 

2. Researchers should think creatively 
about how to design research that sup¬ 
ports causal validity—and thereby 
eliminates rival hypotheses. 

3. Questions about causal relationships, 
however, require qualitative descrip¬ 
tions about the nature and fidelity of 
implementation of interventions. 

4. Descriptive statistics that enable other 
investigators to replicate analyses 
and synthesize findings of a study,
particularly in relation to other studies
examining the same constructs,
must be reported. Although journals 
may request cuts in the manuscript, 
we believe that these statistics are 
foundational for a study and vital to 
its interpretation and application by 
others. 

5. Maximize power by including cova¬ 
riates that reduce error variance in 
hypothesis testing (see also Wright,
2010).

6. Calculate effect size estimates and 
confidence intervals for primary study 
outcomes (Fritz, Morris, & Richler,
2012; Kim, 2015; Peng, Long, & 
Abaci, 2012). 
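
As an illustration of the sixth recommendation, the sketch below computes a standardized mean difference for two independent groups, applies the small-sample (Hedges) correction that matters for the small samples typical of low-prevalence populations, and attaches an approximate 95% confidence interval. The function name and the scores are hypothetical, and the interval relies on a large-sample normal approximation; an interval based on the noncentral t distribution is more accurate when samples are very small.

```python
import math


def hedges_g_with_ci(group1, group2, z=1.96):
    """Return Hedges' g and an approximate confidence interval."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    var1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    var2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)

    # Pooled standard deviation and Cohen's d.
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    d = (m1 - m2) / pooled_sd

    # Small-sample correction factor, then Hedges' g.
    j = 1 - 3 / (4 * (n1 + n2) - 9)
    g = j * d

    # Approximate standard error of d, scaled by the correction factor.
    se_d = math.sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))
    se_g = j * se_d
    return g, (g - z * se_g, g + z * se_g)


# Hypothetical posttest scores for five intervention and five comparison students.
intervention = [12, 15, 14, 16, 13]
comparison = [10, 11, 9, 12, 10]
g, (lower, upper) = hedges_g_with_ci(intervention, comparison)
print(f"g = {g:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Reporting the group sizes, means, and standard deviations alongside the estimate allows other investigators to recompute it and to include the study in a later synthesis.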

Many of these recommendations reflect 
basic principles of research design. How¬ 
ever, in the effort to conduct research in 
low-prevalence fields hampered by popu¬ 
lations with small numbers, extreme het¬ 
erogeneity, and wide geographical disper¬ 
sion, recognizing what works—and what 
does not—is often difficult. We hope 
these recommendations will focus the 
field on potential solutions.